In this file, we import the already cleaned data (with missing values imputed) containing all individual input variables. Then, we import the parameters of the best-performing machine learning model, which we use to predict future happiness. Before we generate the predictions, we apply a series of country-level regression models that project the historical values onto the next two years, thus providing the inputs we need for applying our machine learning model.


Setting things up

Importing relevant packages, defining custom functions, specifying local folders etc.

# Importing relevant packages

# For general data-related tasks
library(plyr)
library(tidyverse)
library(data.table)
library(openxlsx)
library(readxl)
library(arrow)
library(zoo)

# For statistical analysis and ML
library(modelr)
library(randomForest)

# For data visualization
library(plotly)
library(ggplot2)
library(gridExtra)


Importing data

Below, we import historical data on happiness and various background variables, where imputations for missing data have already been performed. In addition, we import the parameters we need for fitting the best-performing machine learning model (as evidenced by our tests in the ML_modelling.Rmd notebook).


User input

Here, we specify how many years into the future to generate predictions for. It should be noted that the farther into the future we go, the less certain the predictions become.

## [1] "Note: predictions will be generated for the years 2023-2024."
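
As a minimal sketch, the horizon setting could look like the snippet below. The variable names `n_years_ahead` and `last_hist_year` are hypothetical, and the last historical year (2022) is an assumption inferred from the note above rather than taken from the notebook's code.

```r
# Hypothetical names; the last historical year is inferred from the note above
n_years_ahead  <- 2
last_hist_year <- 2022

# Years for which predictions will be generated
pred_years <- seq(last_hist_year + 1, last_hist_year + n_years_ahead)

print(paste0("Note: predictions will be generated for the years ",
             min(pred_years), "-", max(pred_years), "."))
```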


We also specify whether to test all possible models for the data used as input to our machine learning model or whether to import the results of a previous test.

## [1] "Results from previously model tests will be imported."
## [1] "Note: this assumes that we're using more or less the same input data."


Preparing inputs for ML model

Before we can proceed to fitting our random forest model, we need to make sure that we have input data for future time periods. Such data is not directly available, but reasonable approximations can be obtained by projecting the trends found in the historical data onto future time periods.

Defining which variables to use in the ML model

The decision of exactly which variables to include in our predictive model(s) is based on our findings from the Ridge_regression_analysis.Rmd notebook, where we explored different combinations of variables.

A full list of all input variables fed into the models tested below is presented here:


##  [1] "E_GDPPerCapitaConstant"          "E_PovertyGap685Headcount"       
##  [3] "E_HealthExpenditurePerCapita"    "E_ExportsPctOfGDP"              
##  [5] "E_GiniIndex"                     "E_EducationExpenditurePctOfGDP" 
##  [7] "E_LaborTaxPctOfProfits"          "E_ConsumerPriceInflation"       
##  [9] "E_FemaleUnemployment"            "E_ImportsPctOfGDP"              
## [11] "P_GovernmentEffectiveness"       "P_ControlOfCorruption"          
## [13] "P_RuleOfLaw"                     "P_CleanElectionsIndex"          
## [15] "S_AccessToCleanFuelsPctOfTotal"  "S_UrbanPopPctOfTotal"           
## [17] "S_AccessToElectricityPctOfTotal" "S_UpperSecEduPctOfTotal"        
## [19] "S_CompulsoryEducationYears"      "S_LaborParticipRateFemale"      
## [21] "S_PopulationAged14OrLess"        "V_CO2PerCapita"                 
## [23] "V_FertilizerUseKgPerHectare"     "V_AgriculturalLandPctOfTotal"   
## [25] "V_ForestAreaPctOfTotal"          "H_AirPollutionMeanExpPctOfPop"  
## [27] "H_TotalSuicideRate"


Generating country-level inputs for ML model

The way we go about projecting historical trends into the future is described below:

  1. We take each individual variable for each individual country and fit several different regression models on it, then record the model fit metrics for each model.
  2. We select the model with the lowest mean absolute percentage error (MAPE) and use it to generate predictions.
  3. We loop through all countries and variables used as inputs in the ML model and consolidate the output in a single data frame.
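
The three steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the notebook's actual code: the candidate model set (linear, quadratic, log trend), the in-sample calculation of MAPE, the helper names `mape` and `project_series`, and the toy data are all hypothetical.

```r
# Mean absolute percentage error; assumes `actual` contains no zeros
mape <- function(actual, fitted) mean(abs((actual - fitted) / actual)) * 100

# Steps 1-2: fit several candidate trend models to one series,
# pick the one with the lowest (in-sample) MAPE, and project it forward.
# The candidate set is illustrative only.
project_series <- function(years, values, future_years) {
  df <- data.frame(year = years, value = values)
  first_year <- min(years)
  candidates <- list(
    linear    = value ~ year,
    quadratic = value ~ poly(year, 2),
    log_trend = value ~ log(year - first_year + 1)
  )
  fits   <- lapply(candidates, lm, data = df)
  scores <- sapply(fits, function(m) mape(df$value, fitted(m)))
  best   <- names(which.min(scores))
  preds  <- predict(fits[[best]], newdata = data.frame(year = future_years))
  data.frame(year = future_years, value = unname(preds), model = best)
}

# Toy historical data in long format (hypothetical values)
hist_data <- data.frame(
  country  = rep(c("A", "B"), each = 10),
  variable = "E_GDPPerCapitaConstant",
  year     = rep(2013:2022, times = 2),
  value    = c(100 + 5 * (0:9), 50 * 1.03^(0:9))
)

# Step 3: loop over every country/variable pair and consolidate the output
groups <- split(hist_data, list(hist_data$country, hist_data$variable), drop = TRUE)
projections <- do.call(rbind, lapply(groups, function(g) {
  out <- project_series(g$year, g$value, future_years = 2023:2024)
  cbind(country = g$country[1], variable = g$variable[1], out)
}))
rownames(projections) <- NULL
print(projections)
```

Scoring on the fitted values keeps the sketch short; with longer series, holding out the last few years and computing MAPE on those would give a more honest comparison between candidate models.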


Applying ML model

Generating predictions

Finally, we import our pre-optimized random forest model and use it to generate predictions for the period 2023-2024.

## [1] "Successfully generated predictions for 298 rows of data."
## [1] "Clean dataset containing history and predictions exported to 'Data/Output'."
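
For illustration, the prediction step could look roughly like the sketch below. Because the pre-optimized model itself is not shown here, a small random forest fitted on toy data stands in for it; the column name `Happiness`, the hyperparameters, and all values are hypothetical.

```r
library(randomForest)

set.seed(42)

# Toy stand-in for the historical data (hypothetical values)
train <- data.frame(
  E_GDPPerCapitaConstant = runif(100, 1000, 60000),
  P_RuleOfLaw            = rnorm(100)
)
train$Happiness <- 3 + 0.00005 * train$E_GDPPerCapitaConstant +
  0.3 * train$P_RuleOfLaw + rnorm(100, sd = 0.2)

# Stand-in for the imported, pre-optimized model
rf_model <- randomForest(Happiness ~ ., data = train, ntree = 200)

# Projected inputs for future years (hypothetical values)
future_inputs <- data.frame(
  E_GDPPerCapitaConstant = c(20000, 45000),
  P_RuleOfLaw            = c(-0.5, 1.2)
)

# Generate predictions from the projected inputs
predicted <- predict(rf_model, newdata = future_inputs)
future_inputs$PredictedHappiness <- predicted

print(paste0("Successfully generated predictions for ",
             nrow(future_inputs), " rows of data."))
```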


Top 10 happiest countries in 2024

First, we would like to see which countries are the happiest based on their most recent predicted annual scores:


As was the case in historical data, we see a lot of Nordic and European countries among the happiest countries in the world; in fact, among the top 10 happiest in 2024, we only see European countries.
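
A ranking of this kind can be produced with a few lines of base R. The sketch below uses placeholder country labels and scores; the column names `Country`, `Year`, and `PredictedHappiness` are assumptions, not the notebook's actual names.

```r
# Placeholder predictions (hypothetical labels and scores)
preds <- data.frame(
  Country            = paste("Country", LETTERS[1:12]),
  Year               = 2024,
  PredictedHappiness = c(5.1, 7.4, 3.2, 6.8, 4.5, 7.1, 2.9, 6.0, 5.6, 7.3, 4.0, 6.5)
)

# Keep only the last predicted year
latest <- preds[preds$Year == max(preds$Year), ]

# Sort descending for the happiest countries, ascending for the least happy
top10    <- head(latest[order(-latest$PredictedHappiness), ], 10)
bottom10 <- head(latest[order(latest$PredictedHappiness), ], 10)

print(top10)
```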


The 10 least happy countries in 2024

Looking at the least happy countries, the predicted happiness scores are not surprising either given the historical background:


As in the historical data, many countries affected by internal or armed conflict, such as Afghanistan, Lebanon and Yemen, are expected to retain their relatively low levels of happiness in 2024.


Happiness around the world in 2024

To make it easier to compare happiness across the globe, we create an interactive color-coded world map in which each country is shaded according to its predicted happiness score. We do not have data for all countries, so some states are colored white due to missing observations.


Conclusion

In this notebook, we’ve demonstrated how we can use a predictive ML model to estimate the expected level of happiness in the future. To do so, we first needed to generate input data for future time periods that we could feed into the model. This was done at the country level, where trends for individual variables were projected into the next two years using the regression model with the lowest MAPE for each series. Once the input data was ready, predicting future happiness scores was straightforward. Finally, we created some visualizations to help us make sense of the predictions, which turned out to be plausible given the historical context.